Experiments with a Noun-Phrase driven Statistical Machine Translation System

Authors

  • Sanjika Hewavitharana
  • Alon Lavie
  • Stephan Vogel
Abstract

This paper presents a noun-phrase-driven, two-level statistical machine translation system. Noun phrases (NPs) serve as the unit of decomposition for building a two-level hierarchy of phrases. English noun phrases are identified using a parser, and their translations are induced using a statistical word alignment model. Identified noun phrase pairs in the training corpus are replaced with a tag to produce an NP-tagged corpus, which is then used to extract phrase translation pairs. Both NP translations and NP-tagged phrases are used in a two-level translation decoder: NP translations tag NPs in the first level, while NP-tagged phrases match across NPs to produce translations in the second level. The two-level system shows significant improvements over a baseline SMT system. It also produces longer matching phrases due to the generalization introduced by tagging NPs.
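The corpus-tagging step described in the abstract can be illustrated with a minimal sketch. It assumes the aligned NP pairs are already available as non-overlapping token spans; the function name `tag_noun_phrases`, the `@NP` tag string, and the toy German sentence are all hypothetical choices for illustration and are not taken from the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # half-open token span [start, end)


def tag_noun_phrases(
    src: List[str],
    tgt: List[str],
    np_pairs: List[Tuple[Span, Span]],
    tag: str = "@NP",  # hypothetical tag string, not from the paper
) -> Tuple[List[str], List[str], List[Tuple[str, str]]]:
    """Replace each aligned NP pair with a tag on both sides.

    Assumes the spans on each side do not overlap. Returns the tagged
    source sentence, the tagged target sentence, and the list of NP
    translation pairs that were removed.
    """

    def replace(tokens: List[str], spans: List[Span]) -> List[str]:
        out: List[str] = []
        pos = 0
        for start, end in sorted(spans):
            out.extend(tokens[pos:start])  # copy tokens before the NP
            out.append(tag)                # collapse the NP into one tag
            pos = end
        out.extend(tokens[pos:])           # copy the remaining tokens
        return out

    np_table = [
        (" ".join(src[s0:s1]), " ".join(tgt[t0:t1]))
        for (s0, s1), (t0, t1) in np_pairs
    ]
    tagged_src = replace(src, [s for s, _ in np_pairs])
    tagged_tgt = replace(tgt, [t for _, t in np_pairs])
    return tagged_src, tagged_tgt, np_table


# Toy example (illustrative data only).
src = "the old man reads a book".split()
tgt = "der alte Mann liest ein Buch".split()
pairs = [((0, 3), (0, 3)), ((4, 6), (4, 6))]
tagged_src, tagged_tgt, np_table = tag_noun_phrases(src, tgt, pairs)
```

In this sketch the NP table would feed the first decoding level, and the tagged sentence pair would feed phrase extraction for the second level. Because every NP collapses into one tag, a tagged phrase such as `@NP reads @NP` can match many different sentences, which is the generalization the abstract credits for longer matching phrases.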

Similar resources

Feature-Rich Statistical Translation of Noun Phrases

We define noun phrase translation as a subtask of machine translation. This enables us to build a dedicated noun phrase translation subsystem that improves over the currently best general statistical machine translation methods by incorporating special modeling and special features. We achieved 65.5% translation accuracy in a German-English translation task vs. 53.2% with IBM Model 4.

Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

In this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calcul...

Handling Multiword Expressions in Phrase-Based Statistical Machine Translation

Preprocessing of the parallel corpus plays an important role in improving the performance of phrase-based statistical machine translation (PB-SMT). In this paper, we propose a framework in which predefined information about Multiword Expressions (MWEs) can boost the performance of PB-SMT. We preprocess the parallel corpus to identify noun-noun MWEs, reduplicated phrases, complex predicates and ...

Phrase-Boundary Translation Model Using Shallow Syntactic Labels

The phrase-boundary model for statistical machine translation labels rules with classes of boundary words on the target-side phrases of the training corpus. In this paper, we extend the phrase-boundary model using shallow syntactic labels, including POS tags and chunk labels. With the priority of chunk labels, the proposed model names non-terminals with shallow syntactic labels on the boundaries of ...

Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency ...

Publication date: 2007